General questions before the start:
In order to consider the multivariate data structure at hand when
running a FPCA, a multivariate FPCA (MFPCA) is conducted using the
R-package MFPCA. Now, a single multiFunData
object is created which consists of one funData object
(package funData) which is three-dimensional: the first
dimension represents the grid cells. We have 434, 442, 462 and 465
disturbed grid cells in scenarios Control,
SSP1-RCP2.6, SSP3-RCP7.0 and SSP5-RCP8.5,
respectively. This adds up to 1803 grid cells in total. The second
dimension represents the Year after Disturbance and the third
one the PFT, numbered from 1 to 5. Since we are looking at four
independent runs of the vegetation model LPJ-GUESS, the grid cells
disturbed in each scenario may vary or may coincide. In total, 1577
unique grid cells are disturbed, where 226 of them are disturbed in at
least two scenarios.
The MFPCA is conducted with 10 principal components in order to be
consistend with the univariate FPCAs conducted before for each scenario
and PFT. Note that the MFPCA package does not provide an
implementation of reconstructing the initial curves for
multi-dimensional data.
Questions here: Is there a mathematical or computational reason, why this is not implemented? Would it be worth the effort to implement it myself?
In the following, we are going to see the principal components, now as heatmaps, a plot visualizing the first two principal components and the results of a 4-means clustering.
As a first step, let’s take a look at the principal components themselves. The following table shows the variability in per cent that the first 10 principal components account for in each scenario:
| PC 1 | PC 2 | PC 3 | PC 4 | PC 5 | PC 6 | PC 7 | PC 8 | PC 9 | PC 10 | |
|---|---|---|---|---|---|---|---|---|---|---|
| Variability | 91.22 | 7.49 | 0.52 | 0.33 | 0.21 | 0.1 | 0.06 | 0.04 | 0.01 | 0.01 |
Note that this time, the values truly represent the accounted variation, not only the relative part. Obviously, the first principal component already captures nearly all of the variability in the data. The screeplot depicted in Figure 1 underlines this result:
In order to investigate changes in the values of the first 10 principal components, the following Figure 2 shows how positive (first plot) and negative deviations (second plot) to the mean act on the PC scores. Note that these plots only cover differences, not the mean itself. Values smaller than the mean are displayed in different shades of blue, while values above the mean are displayed in red.
The interpretation for the first PC might be the following:
So in total, high values in the first component indicate a lower share of Needleleaf Evergreen, a higher share of Pioneering Broadleaf and a higher share of Tundra in the beginning of the study period than the mean. Other Conifers and Temperate Broadleaf do not substantially contribute to the dynamics.
In order to group the curves into clusters, the first 10 PC scores are clustered by a k-means algorithm. In order to evaluate an appropriate number of clusters, Figure 3 shows an elbow plot for \(k \in \{1,...,10\}\):
The elbow plot would indicate an elbow at \(k=2\). However, given the data structure with four scenarios, in the following, \(k=4\) is chosen. The four clusters are rather unbalanced and dominated by two large clusters, namely clusters 3 and 4, as the following table indicates:
| Control | SSP1-RCP2.6 | SSP3-RCP7.0 | SSP5-RCP8.5 | Sum | |
|---|---|---|---|---|---|
| Cluster 1 | 76 | 46 | 56 | 67 | 245 |
| Cluster 2 | 75 | 54 | 63 | 47 | 239 |
| Cluster 3 | 231 | 166 | 142 | 126 | 665 |
| Cluster 4 | 52 | 176 | 201 | 225 | 654 |
| Sum | 434 | 442 | 462 | 465 | 1803 |
In order to get a first impression of the cluster-specific PFT-behavior, Figure 4 shows the PFT-wise mean shares of above ground carbon over the first 100 years of recovery. Note that here, the clusters are derived using the whole time span of 126 years of recovery and all scenarios are combined.
The plot indicates that cluster 1 is mainly driven by Temperate Broadleaf, while Pioneering Broadleaf is the dominant species in clusters 2 and 4. These clusters differ in terms of other PFTs: in cluster 4, after about 60 years of recovery, hardly anything but Pioneering Broadleaf is present, whereas in cluster 2, other needleleafed trees are present as well. Cluster 3 is dominated by Needleleaf Evergreen, followed by Conifers (others). Note that Tundra does not play an important role here.
Theory in science of vegetation states that the vegetation composition after disturbance is already set only shortly after the disturbance. This would mean in statistical terms, that the cluster assignment should be rather similar no matter how many years of recovery after disturbance are considered. Figure 5 shows the cluster assignments for all four scenarios over different years after disturbance:
We can clearly see that the final composition of clusters is already (nearly) reached after about 60 years depending on the scenario. Drastic changes are especially in the first 20 years and the clusters quickly stabilize in their core grid cells.
Note that Lucia could not detect these patterns in her clustering, so she assumed that this mechanism is probably not covered by the simulation model LPJ-GUESS. This result here however could be an indication for that phenomenon in the simulation.
In order to further investigate these patterns, Figure 6 shows the cluster-wise means after different time spans of vegetation. Here, the clusters after 5, 10, 20, …, 120, 126 years are taken and evaluated by their mean share of above ground carbon over grid cells and clusters. This means in particular that in each considered time step, the cluster assignment could be changed (in contrast to Figure 4).
We can see that (depending on the PFT), the dominance of specific PFTs establishes after only a few decades.
Possible next steps: Further investigate the clusters, also considering climate data: are there any thresholds that define the clusters precisely (e.g. above a certain temperature, we are in cluster 4)?
Figure 7 shows the first two principal components plotted against each other. The colors in the left plot visualize the clusters, while in the right plot, they indicate the scenario.
We can see that the clusters found are clearly distinguishable by the first two PCs and that the clusters derived by 4-means find patterns which are beyond the partitioning into scenarios.
Next, to further investigate the cluster assignment, in the following, the clusters are displayed for each PFT and scenario separately. Let’s start with Tundra.
Figure 8 shows the four clusters for scenario Control. Cluster 3 represents the largest cluster with nearly half of the grid cells inside. Cluster 1 covers those curves with a less steep decrease in above ground carbon. Cluster 2 and 4 are rather similar.
Next, Figure 9 shows the equivalent plot for scenario SSP1-RCP2.6. Here, the two dominating clusters 3 and 4 are rather similar, while clusters 1 and 2 cover all grid cells with a (second) peak after 60 years of regeneration.
Figure 10 (scenario SSP3-RCP7.0) shows a similar pattern: there are no substantial differences between the clusters.
Finally, Figure 11 shows the clustered curves for the most drastic scenario SSP5-RCP8.5. Again, no clear differences are visible by visual inspection.
In total, PFT Tundra seems not to be dominant for cluster building.
Second, let’s take a look at PFT Needleleaf Evergreen. Figure 12 shows the clustered curves for scenario Control:
Here, some obvious patterns are detected by the clustering: while clusters 1 and 2 cover mostly mediate shares of above ground carbon, cluster 3, the dominant cluster, represents grid cells with a rather high share. Cluster 4 on the other hand represents those curves with a small peak in the beginning and a rather low share in general.
Figure 13 shows the equivalent plot for scenario SSP1-RCP2.6:
Here, the same patterns as for the Control scenario become apparent: the two dominant clusters 3 and 4 represent very high and very low shares of above ground carbon, respectively, while cluster 1 and 2 cover mediate shares. Note that in contrast to before, now, we have two dominant clusters.
Figure 14 and Figure 15 visualize the clustering for scenarios SSP3-RCP7.0 and SSP5-RCP8.5, respectively. Again, the same dynamics as for the first warming scenario are present in the data.
In total, clusters 1 and 2 are mainly driven by mediate shares of above ground carbon of Needleleaf Evergreen, while clusters 3 and 4 mainly reflect rather high and low shares, respectively. Therefore, PFT Needleleaf Evergreen seems to be an important component in the clustering process.
Figure 16 shows the clustered curves for scenario Control and PFT Pioneering Broadleaf.
Here, clusters 1 and 3 represent rather low shares of above ground carbon, while clusters 2 and 4 cover higher shares. Note that clusters 2 and 4 differ in the increase dynamic, i.e. the increase in share in cluster 4 is more steep and with a higher peak on average than in cluster 2. Cluster 3 is the dominant cluster.
Figure 17 shows the equivalent plot for scenario SSP1-RCP2.6:
The two dominant clusters 3 and 4 represent mostly very low and very high shares in above ground carbon, respectively. The two remaining clusters, clusters 1 and 2, cover moderate low and moderate high shares, respectively.
Figure 18 and 19 show the dynamics for scenarios SSP3-RCP7.0 and SSP5-RCP8.5, respectively. Again, similar patterns are obvious, but with an interior dynamic: the more the temperature rises, the less clear the clustering. For example, cluster 3, which represents mainly rather low shares, tend to have more and more curves with higher shares. The pattern in cluster 2, i.e. the behavior in increasing changes and becomes less uniform.
In total, Pioneering Broadleaf seems to be a major driver of cluster building.
Figure 20 shows the clustering dynamics for scenario Control and PFT Conifers (others). Some minor differences are present: while the dominating cluster 3 and cluster 2 mainly cover grid cells with a high share of above ground carbon, clusters 1 and 4 represent those cells with a maily lower share. Nite that the dynamics in these clusters differ in terms of the increase and timing of the peak.
Figure 21 shows the equivalent plot for scenario SSP1-RCP2.6. Now, the two dominant clusters 3 and 4 cover mainly grid cells with a high and low share of above ground carbon, respectively, while clusters 1 and 2 cover the mediate shares.
Figure 22 and Figure 23 show the same dynamics for scenarios SSP3-RCP7.0 and SSP5-RCP8.5, respectively. The dominating clusters 3 and 4 represent high and low shares of above ground carbon, while the two remaining clusters cover mediate shares.
In summary, PFT Conifers (others) seems to mainly affect clusters 3 and 4, but is a minor factor in clustering in general.
Lastly, let’s take a look at PFT Temperate Broadleaf. Figure 24 shows the clustered curves for scenario Control:
In the Control scenario, only a few grid cells have a non-zero share of above ground carbon. Due to this lack of data, the clustering should not be overrated.
Figure 25 shows the equivalent plot for scenario SSP1-RCP2.6. Clearly, some dependencies are apparent: clusters 1 and 2 comprise of almost all of the non-zero curves, but reflect both a different dynamic in the increase.
With increase in temperature, Temperate Broadleaf becomes more and more important. This is also visible in Figure 24, where the clustered curves for scenario SSP3-RCP7.0 are displayed. In all four clusters there are a couple of non-zero curves, with most of them being in cluster 1, the smallest cluster.
A similar pattern is apparent in the data for scenario SSP5-RCP8.5 depicted in Figure 27. Cluster 1 is the only cluster with representing some dynamic in the curves.
In total, PFT Temperate Broadleaf hardly affects the clustering algorithm.
To conclude and bring together all the results, here, the effects of the PFTs on the clusters are summarized for each scenario.
Control:
Cluster 1: is driven by high shares of Tundra and Conifers (others), by moderate shares of Needleleaf Evergreen and by low shares of Pioneering Broadleaf.
Cluster 2: is driven by high shares of other Conifers, moderate shares of Needleleaf Evergreen and Tundra and of low shares of Pioneering Broadleaf.
Cluster 3: is driven by high shares of Needleleaf Evergreen and other Conifers, by moderate shares of Tundra and low shares of Pioneering Broadleaf.
Cluster 4: is driven by high shares of Pioneering Broadleaf, by moderate shares of Tundra, and by low shares of Needleleaf Evergreen and other Conifers.
SSP1-RCP2.6:
Cluster 1: is driven by high shares of Temperate Broadleaf and by moderate shares of Needleleaf Evergreen, other Conifers, Tundra and Pioneering Broadleaf.
Cluster 2: is driven by high shares of Pioneering Broadleaf and moderate shares of Needleleaf Evergreen, Tundra and other Conifers.
Cluster 3: is driven by high shares of other Conifers and Needleleaf Evergreen, moderate shares of Tundra and low shares of *Pioneering Broadleaf.
Cluster 4: is driven by high shares of Pioneering Broadleaf and low shares of Needleleaf Evergreen and other Conifers.
SSP3-RCP7.0:
Cluster 1: is driven by high shares of Temperate Broadleaf and moderate shares of Needleleaf Evergreen, Pioneering Broadleaf and other Conifers.
Cluster 2: is driven by high shares of Pioneering Broadleaf and moderate shares of Needleleaf Evergreen and other Conifers.
Cluster 3: is driven by high shares of Needleleaf Evergreen and other Conifers and by low shares of Pioneering Broadleaf.
Cluster 4: is driven by high shares of Pioneering Broadleaf and by lows shares of Needleleaf Evergreen and other Conifers.
SSP5-RCP8.5:
Cluster 1: is driven by high shares of Temperate Broadleaf, moderate shares of other Conifers and Pioneering Broadleaf and Needleleaf Evergreen.
Cluster 2: is driven by high shares of Pioneering Broadleaf and mediate shares of Needleleaf Evergreen and other Conifers.
Cluster 3: is driven by high shares of Needleleaf Evergreen and other Conifers and by low shares of Pioneering Broadleaf.
Cluster 4: is driven by high shares of Pioneering Broadleaf and lows shares of Needleleaf Evergreen and other Conifers.
In total, the clusters are especially influenced by PFTs Needleleaf Evergreen, Pioneering Broadleaf and other Conifers, while PFTs Temperate Broadleaf and Tundra seem to be less crucial for cluster forming.